Abstract

The average redundancy of the Shannon code, $R_n$, as a function of the block length $n$, is
known to exhibit two very different types of behavior, depending on the rationality or
irrationality of certain parameters of the source: It either converges to 1/2 as $n$ grows without
bound, or it may have a non-vanishing, oscillatory, (quasi-)periodic pattern around the value 1/2
for all large $n$. In this paper, we attempt to shed some light on this erratic behavior
of $R_n$ by drawing an analogy with the physics of wave propagation, in particular, the
elementary theory of scattering and diffraction. It turns out that there are two types of behavior
of wave diffraction patterns formed by crystals, which are correspondingly analogous to the two
types of patterns of $R_n$. When the crystal is perfect, the diffraction intensity spectrum exhibits
very sharp peaks, a.k.a. Bragg peaks, at wavelengths of full constructive interference. These
wavelengths correspond to the frequencies of the harmonic waves of the oscillatory mode of $R_n$.
On the other hand, when the crystal is imperfect and there is a considerable degree of disorder
in its structure, the Bragg peaks disappear, and the behavior of this mode is analogous to the
one where $R_n$ is convergent.

Index Terms: Lossless source coding, redundancy, Shannon code, scattering, diffraction, Bragg
peaks, disorder.



1 Introduction 



The analysis of the average redundancy of lossless codes for data compression schemes is a topic
that has attracted the attention of many researchers throughout the history of Information
Theory (cf., e.g., [1], [3], [6], [7], [8], [9], [10], [11], [12], [13] and many references therein).

In [13], Szpankowski derived the asymptotic behavior of the average redundancy $R_n$, as a
function of the block length $n$, for the Shannon code, the Huffman code, and other codes, focusing
primarily on the binary memoryless source, parametrized by $p$, the probability of zero. His analysis
revealed a rather interesting behavior of $R_n$, especially in the cases of the Shannon code and the
Huffman code: When $\alpha = \log_2[(1-p)/p]$ is irrational, then $R_n$ converges to a constant
(which is 1/2 for the Shannon code and $3/2 - 1/\ln 2$ for the Huffman code) as $n \to \infty$. On
the other hand, when $\alpha$ is rational, $R_n$ has a non-vanishing oscillatory term of the form
$\langle \beta m_0 n \rangle$, where $\beta = -\log_2(1-p)$, $m_0$ is the denominator of
$\alpha = \ell_0/m_0$ in its representation as the ratio between two integers whose
greatest common divisor is 1, and $\langle x \rangle = x - \lfloor x \rfloor$ designates the
fractional part of a real number $x$. In several places in his paper, Szpankowski describes this
behavior of $R_n$ as "erratic" and this qualifier is, of course, understandable.

Our purpose in this paper is to make an attempt to give some insight into this erratic behavior of 
Rn by drawing an analogy with the physics of wave diffraction. From the theory of X-ray scattering 
(see, e.g., [2, Chapter 2], [14]), it is known that if the object that causes the diffraction of an incident 
wave is a perfect crystal, then the intensity profile of the scattered wave (as a function of the 
wavelength or the wave number) exhibits very sharp peaks, known as Bragg peaks, at wavelengths 
that correspond to full coherence, where the optical distance differences to all scattering elements 
(layers of the crystal) are exactly integer multiples of the wavelength. This continues to be the case 
as long as there is enough order in the medium such that all these distances are commensurable 
and therefore have a common divisor (common unit of length), which can serve as the fundamental
wavelength. In the realm of the average redundancy analysis, this corresponds to the case where
$\alpha$ is rational and the fundamental frequency of the oscillatory term
$\langle \beta m_0 n \rangle$ of $R_n$ is intimately
related to the fundamental wavelength at which there is a Bragg peak. On the other hand, when
the distances are incommensurable, perfect coherence between all scattered waves is not achieved
at any wavelength and therefore no Bragg peaks are observed. This is the case of strong disorder,
which in the lossless source coding problem, corresponds to the case of $\alpha$ irrational, where
$R_n$ is convergent.

More concretely, the analysis of the scattered wave intensity function is based on a very simple 
model of disorder, which is due to Hendricks and Teller [5] (see also [4]). According to the
Hendricks-Teller (HT) model, the distances between every two consecutive layers in the solid are
selected independently at random from a finite set of two or more distances. In the simplest case,
where there are only two possible distances $d_0$ and $d_1$, with probabilities $p$ and $1-p$, this
random selection process is analogous to the memoryless binary source of the data compression
problem and the parameter $\alpha$ of this source plays a role analogous to that of the ratio
$d_1/d_0$. Thus, $\alpha$ irrational means that $d_0$ and $d_1$ are incommensurable, which is the
case of strong disorder with no Bragg peaks and no oscillations in $R_n$. On the other hand, when
$\alpha = d_1/d_0$ is rational, we are in the (partially) ordered mode, as described above.

From the purely mathematical point of view, the analogy between the average redundancy problem
and the diffraction problem is rooted in that at the heart of the analyses of both problems,
there is one very simple mathematical fact in common: Given a vector $(p_0, p_1, \ldots, p_{M-1})$
of non-negative reals summing to unity (probabilities) and a vector
$(\alpha_1, \ldots, \alpha_{M-1}) \in \mathbb{R}^{M-1}$, the complex number

$$C_m = p_0 + \sum_{j=1}^{M-1} p_j e^{2\pi i m \alpha_j}, \qquad i = \sqrt{-1}, \quad m = 1, 2, 3, \ldots \tag{1}$$

has a modulus that obviously never exceeds unity, and $C_m = 1$ (i.e., full coherence between all
$M$ phasors) is attained for some integer values of $m$ if and only if $\{\alpha_j\}$ are all
rational. When this is the case, then $C_m = 1$ for all values of $m$ which are integer multiples
of $m_0$, the first positive integer $m$ for which $m\alpha_j$ is integer for all
$1 \le j \le M-1$ at the same time.^ The analogy between
the Shannon code redundancy analysis and the diffraction patterns under the HT model will center
around (1) and its two types of behavior depending on the rationality or irrationality of
$\{\alpha_j\}$.
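
As a small numerical illustration of this fact (a sketch only, not part of the original analysis), the following Python snippet evaluates $|C_m|$ from (1) for one rational and one irrational choice of $\{\alpha_j\}$. The probabilities and the values of $\alpha_j$ are hypothetical and chosen purely for illustration; with $\alpha = (1/2, 2/3)$ one expects $m_0 = 6$.

```python
import cmath
import math

def coherence(probs, alphas, m):
    """C_m = p_0 + sum_{j>=1} p_j * exp(2*pi*i*m*alpha_j), as in (1)."""
    return probs[0] + sum(
        p * cmath.exp(2j * math.pi * m * a) for p, a in zip(probs[1:], alphas)
    )

# Hypothetical example values (not taken from the paper).
probs = [0.5, 0.3, 0.2]
rational = [1 / 2, 2 / 3]                   # m*alpha_j all integers first at m = 6
irrational = [math.sqrt(2), math.sqrt(3)]   # no integer m makes both m*alpha_j integers

for m in range(1, 13):
    print(m,
          round(abs(coherence(probs, rational, m)), 6),
          round(abs(coherence(probs, irrational, m)), 6))
```

In the rational column the modulus should return to 1 exactly at the multiples of $m_0$, whereas in the irrational column it should stay strictly below 1 for every $m$.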

The remaining part of this short paper consists of two more main sections. For the sake of
completeness, in Section 2, we summarize the main ingredients of the derivation in [13] (with a
few shortcuts), emphasizing the use of the simple mathematical fact described in the previous
paragraph. For reasons of simplicity, we focus on the Shannon code and the derivation is
specialized to the memoryless case. In Section 3, we present the derivation of the diffraction
patterns of the HT model, with a focus on the analogy with Section 2. We then describe in detail
the mapping between the two problems under discussion. Finally, in Section 4 we summarize and
conclude, with a comment on a possible extension to the Markov case.
^The previous paragraph refers to the special case M = 2. 
2 Average Redundancy of the Shannon Code



Throughout the remaining part of this paper, we use capital letters to designate random variables
(e.g., $X_i$) and the corresponding lower case letters to denote specific realizations (e.g., $x_i$).

Consider a finite alphabet memoryless source $X_1, X_2, \ldots$ with alphabet
$\mathcal{X} = \{0, 1, 2, \ldots, M-1\}$ and symbol probabilities $\{p_0, p_1, \ldots, p_{M-1}\}$.
The Shannon code for lossless data compression assigns to every source $n$-tuple
$x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ a binary codeword of length

$$\ell(x) = \lceil -\log P(x) \rceil = \left\lceil -\log \prod_{t=1}^{n} p_{x_t} \right\rceil, \tag{2}$$

where $\lceil u \rceil$ designates the smallest integer not smaller than $u$. The average redundancy
of the Shannon code is defined as

$$R_n = E\{\ell(X)\} - nH, \tag{3}$$

where

$$H = -\sum_{j=0}^{M-1} p_j \log p_j \tag{4}$$

is the per-symbol entropy. The derivation of the asymptotic expression for $R_n$ in [13] can be
presented (with a few slight shortcuts and modifications) as follows. By using the Fourier series
expansion of the function $\langle u \rangle$, according to

$$\langle u \rangle = \frac{1}{2} - \sum_{m \ne 0} a_m e^{2\pi i m u}, \qquad a_m = \frac{1}{2\pi i m}, \tag{5}$$

we have the following:

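$$
\begin{aligned}
R_n &= E\{\lceil -\log P(X) \rceil + \log P(X)\} \\
&= 1 - E\{-\log P(X) - \lfloor -\log P(X) \rfloor\} \\
&= 1 - E\{\langle -\log P(X) \rangle\} \\
&= 1 - E\left\{\frac{1}{2} - \sum_{m \ne 0} a_m \exp[-2\pi i m \log P(X)]\right\} \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m E\{\exp[-2\pi i m \log P(X)]\} \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{x \in \mathcal{X}^n} \left(\prod_{t=1}^{n} p_{x_t}\right) \exp\left[-2\pi i m \sum_{t=1}^{n} \log p_{x_t}\right] \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{x \in \mathcal{X}^n} \prod_{t=1}^{n} p_{x_t} \exp[-2\pi i m \log p_{x_t}] \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m \prod_{t=1}^{n} \left(\sum_{x_t=0}^{M-1} p_{x_t} \exp[-2\pi i m \log p_{x_t}]\right) \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m \left(\sum_{j=0}^{M-1} p_j \exp[-2\pi i m \log p_j]\right)^{n} \\
&= \frac{1}{2} + \sum_{m \ne 0} a_m \exp(-2\pi i m n \log p_0) \left[p_0 + \sum_{j=1}^{M-1} p_j \exp\{2\pi i m \log(p_0/p_j)\}\right]^{n}.
\end{aligned}
\tag{6}
$$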




Denoting $\alpha_j = \log(p_0/p_j)$, $j = 1, 2, \ldots, M-1$, the expression in the square brackets
is exactly $C_m$ as was defined in (1). The behavior of $R_n$ for large $n$ is then as follows. If
$\{\alpha_j\}$ are not all rational, then $|C_m| < 1$ for all $m$, and so,
$\lim_{n \to \infty} C_m^n = 0$, which causes the entire summation over $m$ to
vanish for large $n$. In this case, $R_n \to 1/2$ as $n \to \infty$. On the other hand, if
$\{\alpha_j\}$ are all rational, then there exists an integer $m$ such that $m\alpha_j$ are all
integers. Let $m_0$ be the smallest positive integer with this property. Then all other integers
with the same property are integer multiples of $m_0$. Consequently,
$\lim_{n \to \infty} C_m^n = 1$ whenever $m$ is an integer multiple of $m_0$ and
$\lim_{n \to \infty} C_m^n = 0$ otherwise. Thus, denoting $\beta = -\log p_0$, we now have for
large $n$,

$$
\begin{aligned}
R_n &\approx \frac{1}{2} + \sum_{k \ne 0} a_{k m_0} e^{2\pi i k m_0 \beta n} \\
&= \frac{1}{2} + \frac{1}{m_0} \sum_{k \ne 0} a_k e^{2\pi i k (\beta m_0 n)} \\
&= \frac{1}{2} + \frac{1}{m_0}\left(\frac{1}{2} - \langle \beta m_0 n \rangle\right),
\end{aligned}
\tag{7}
$$

where the second line holds since $a_m$ is inversely proportional to $m$ (see (5) above) and in the
third line we used again (5) with $u = \beta m_0 n$. As can easily be seen from the second line of
(7), for large $n$, the sequence $R_n$ is harmonic with a fundamental frequency
$\omega_0 = 2\pi m_0 \beta$. In other words, the
Fourier transform of $\{R_n\}$ contains Dirac delta functions at integer multiples of $\omega_0$
(modulo $2\pi$). We will see later on that these spectral spikes are analogous to the Bragg peaks
of the HT model.

At this point, a technical comment is in order. At first glance, it may seem that the above
approximate expression of $R_n$ is asymmetric with respect to permutations of the alphabet, because
$\beta$ was defined as $-\log p_0$ and the choice of the symbol $x = 0$ as having a special role in
the last line of (6) was completely arbitrary (we could have chosen, of course, any other symbol
$j$ as well). However, note that $\langle \beta m_0 n \rangle = \langle -m_0 n \log p_0 \rangle$ is
identical to $\langle -m_0 n \log p_j \rangle$ for all $j = 1, \ldots, M-1$
because in the rational case considered above, the numbers $\{-m_0 n \log p_j\}_{j=0}^{M-1}$
differ from each other by integers, and therefore their fractional parts are all the same. Thus,
the above expression of $R_n$ is, in fact, invariant to permutations of the alphabet.
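
The two modes of behavior can be observed numerically. The following Python sketch (an illustration under stated assumptions, not the derivation of [13]) computes $R_n$ exactly for a binary memoryless source by summing over the number of zeros in the block, and compares it with the large-$n$ oscillatory expression $\frac{1}{2} + \frac{1}{m_0}\left(\frac{1}{2} - \langle \beta m_0 n \rangle\right)$. Logarithms are taken to base 2, and the three values of $p_0$ are hypothetical examples: two with rational $\alpha = \log_2(p_0/p_1)$ and one with irrational $\alpha$. The helper names are ours.

```python
import math

def shannon_redundancy(p0, n):
    """Exact R_n = E[ceil(-log2 P(X)) + log2 P(X)] for a binary memoryless source."""
    p1 = 1.0 - p0
    log_p0, log_p1 = math.log2(p0), math.log2(p1)
    total = 0.0
    for k in range(n + 1):                        # k = number of zeros in the block
        prob = math.comb(n, k) * p0 ** k * p1 ** (n - k)
        u = -(k * log_p0 + (n - k) * log_p1)      # ideal length -log2 P(x)
        total += prob * (math.ceil(u) - u)
    return total

def oscillatory_approx(beta, m0, n):
    """Large-n expression 1/2 + (1/2 - <beta*m0*n>) / m0 for the rational case."""
    x = beta * m0 * n
    return 0.5 + (0.5 - (x - math.floor(x))) / m0

# Hypothetical example sources (illustrative values only).
p_a = 2 / 3                                  # alpha = log2(p0/p1) = 1,   m0 = 1
p_b = math.sqrt(2) / (1 + math.sqrt(2))      # alpha = 1/2,               m0 = 2
p_c = 0.3                                    # alpha = log2(3/7), irrational

for n in range(20, 26):
    print(n,
          round(shannon_redundancy(p_a, n), 6),
          round(oscillatory_approx(-math.log2(p_a), 1, n), 6),
          round(shannon_redundancy(p_b, n), 6),
          round(oscillatory_approx(-math.log2(p_b), 2, n), 6),
          round(shannon_redundancy(p_c, n), 6))
```

For the two rational sources the exact and approximate columns should agree closely and keep oscillating with $n$, while the last column is the irrational case, for which the analysis above predicts convergence to 1/2 (possibly slowly).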

3 Diffraction Patterns of the HT Model 

The simplest way to think of the HT model is as a one-dimensional model of an alloy, which is
characterized by a sequence of mass points, positioned along the real line at random locations
$Z_0, Z_1, \ldots, Z_{n-1}$. The ensemble of the HT model is defined in terms of the spacings
$\Delta_j = Z_j - Z_{j-1}$, $j = 1, 2, \ldots, n-1$, which are $n-1$ i.i.d. random variables taking
on values in a finite set $\{d_0, d_1, \ldots, d_{M-1}\}$ with probabilities
$p_0, p_1, \ldots, p_{M-1}$, respectively (thus, $Z_0, Z_1, \ldots$ is a random
walk). Each point $Z_j$ contributes a scattered wave described by the phasor $e^{iqZ_j}$, where in
the one-dimensional setting considered here, $q$ can be understood as the wave number, that is,
$q = 2\pi/\lambda$, where $\lambda$ is the wavelength. Assuming the same amplitudes at all points,
the superposition of all these contributions is then the sum $U(q) = \sum_j e^{iqZ_j}$, which can
be interpreted as the Fourier transform of the function $u(z) = \sum_j \delta(z - Z_j)$. The
overall intensity of this superposition of waves is designated by the structure function
[2, Chapter 2]

$$I(q) = E\{|U(q)|^2\} = E\left\{\sum_{k,\ell} e^{iq(Z_k - Z_\ell)}\right\} = \sum_{k,\ell} E\{e^{iq(Z_k - Z_\ell)}\}, \tag{8}$$

where the expectation is with respect to the random variables $\{Z_j\}$.

The derivation of $I(q)$ is fairly simple (see, e.g., [4]) and it is reproduced here for the sake
of completeness.

$$
\begin{aligned}
I(q) &= \sum_{k,\ell} E\{e^{iq(Z_k - Z_\ell)}\} \\
&= n + \sum_{k > \ell} E\{e^{iq(Z_k - Z_\ell)}\} + \sum_{k < \ell} E\{e^{iq(Z_k - Z_\ell)}\} \\
&= n + \sum_{k > \ell} E\{e^{iq(Z_k - Z_\ell)}\} + \sum_{k > \ell} E\{e^{-iq(Z_k - Z_\ell)}\} \\
&\triangleq n + I_0(q) + I_0^*(q),
\end{aligned}
\tag{9}
$$

where $I_0(q)$ is defined as the second term of the third line and $I_0^*(q)$ is the complex
conjugate of $I_0(q)$. Now,



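$$
\begin{aligned}
I_0(q) &= \sum_{k > \ell} E\{e^{iq(Z_k - Z_\ell)}\} \\
&= \sum_{k > \ell} E\left\{\exp\left(iq \sum_{s=\ell+1}^{k} \Delta_s\right)\right\} \\
&= \sum_{k > \ell} \prod_{s=\ell+1}^{k} E\{e^{iq\Delta_s}\} \\
&= \sum_{k > \ell} \left[\sum_{j=0}^{M-1} p_j e^{iqd_j}\right]^{k-\ell} \\
&= \sum_{r=1}^{n-1} (n - r)[C(q)]^r,
\end{aligned}
\tag{10}
$$

where we have denoted

$$C(q) = \sum_{j=0}^{M-1} p_j e^{iqd_j}. \tag{11}$$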

For $n$ large, whenever $|C(q)| < 1$, the last expression is dominated by the term
$n \sum_{r=1}^{\infty} [C(q)]^r = nC(q)/[1 - C(q)]$, which together with the two other terms of (9),
yields

$$I(q) \approx n\left[1 + \frac{C(q)}{1 - C(q)} + \frac{C^*(q)}{1 - C^*(q)}\right] = n \cdot \frac{1 - |C(q)|^2}{|1 - C(q)|^2}, \tag{12}$$

or equivalently,

$$\lim_{n \to \infty} \frac{I(q)}{n} = \frac{1 - |C(q)|^2}{|1 - C(q)|^2}. \tag{13}$$
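
The finite-$n$ expression for the intensity can be sanity-checked numerically. The sketch below (an illustration only) evaluates $I(q) = n + 2\,\mathrm{Re}\sum_{r=1}^{n-1}(n-r)[C(q)]^r$ with $C(q) = \sum_j p_j e^{iqd_j}$, compares it for a small $n$ with a brute-force expectation of $|U(q)|^2$ over all spacing sequences, and then compares $I(q)/n$ for a large $n$ with the limit $(1-|C(q)|^2)/|1-C(q)|^2$. The distances, probabilities and wave number are hypothetical values, the function names are ours, and $Z_0 = 0$ is assumed without loss of generality since the intensity depends only on position differences.

```python
import cmath
import itertools
import math

def c_of_q(q, dists, probs):
    """C(q) = sum_j p_j * exp(i*q*d_j)."""
    return sum(p * cmath.exp(1j * q * d) for p, d in zip(probs, dists))

def intensity_formula(q, n, dists, probs):
    """I(q) = n + 2*Re( sum_{r=1}^{n-1} (n - r) * C(q)^r )."""
    c = c_of_q(q, dists, probs)
    return n + 2 * sum((n - r) * c ** r for r in range(1, n)).real

def intensity_brute_force(q, n, dists, probs):
    """E|U(q)|^2 by enumerating every sequence of n-1 spacings (small n only)."""
    total = 0.0
    for idx in itertools.product(range(len(dists)), repeat=n - 1):
        weight = math.prod(probs[j] for j in idx)   # probability of this sequence
        positions = [0.0]                           # assume Z_0 = 0
        for j in idx:
            positions.append(positions[-1] + dists[j])
        u = sum(cmath.exp(1j * q * z) for z in positions)
        total += weight * abs(u) ** 2
    return total

def limit_per_point(q, dists, probs):
    """Large-n value of I(q)/n when |C(q)| < 1."""
    c = c_of_q(q, dists, probs)
    return (1 - abs(c) ** 2) / abs(1 - c) ** 2

# Hypothetical parameters (illustrative only).
dists, probs, q = [1.0, 1.5], [0.4, 0.6], 2.3
print(intensity_formula(q, 6, dists, probs), intensity_brute_force(q, 6, dists, probs))
print(intensity_formula(q, 2000, dists, probs) / 2000, limit_per_point(q, dists, probs))
```

The brute-force enumeration grows as $M^{n-1}$, so it is only meant as a check for small $n$; the closed-form sum is what one would use in practice.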
If there are values of $q$ for which $|C(q)| = 1$, yet $C(q) \ne 1$, then the geometric series
diverges at these points, but these are only points of removable discontinuity in $I(q)$ because
for every other point, arbitrarily close to such a discontinuity point, again $|C(q)| < 1$, and the
geometric series converges. The truly problematic points, if any, are those where $C(q) = 1$. For
$C(q) = 1$, we have to re-derive the expression of $I(q)$ separately, which is very simple as
$I(q)$ is just a sum of ones over all $n^2$ pairs $(k, \ell)$, namely, $I(q) = n^2$. In other
words, the intensity scales quadratically rather than linearly with $n$, which means that these
are extremely high peaks in $I(q)$, namely, the Bragg peaks.

For $C(q)$ to take the value 1 for some $q$, the products $qd_j$ must all be integer multiples of
$2\pi$.



Suppose that $q$ is such that $qd_0 = 2\pi m$ for some integer $m$, i.e., $q = q_m = 2\pi m/d_0$,
in which case we shall denote $C(q_m)$ by $C_m$, as before. In this case,

$$C_m = p_0 + \sum_{j=1}^{M-1} p_j e^{2\pi i m d_j/d_0}. \tag{14}$$
But this is again exactly the expression in (1), this time with $\alpha_j = d_j/d_0$, which as
mentioned earlier, may assume the value 1, for some integer values of $m$, if and only if
$\alpha_j = d_j/d_0$ are all rational, or equivalently, $d_0, d_1, \ldots, d_{M-1}$ are
commensurable. When this is the case, then as before, there
exists an integer $m$ for which $md_j/d_0$ are all integers simultaneously. Analogously to the
derivation in Section 2, let $m_0$ be the smallest positive integer with this property. Then, the
Bragg peaks appear at wave-numbers $q_{km_0}$, $k = 1, 2, \ldots$, which correspond to wavelengths
$\lambda_0/k$, where $\lambda_0 = d_0/m_0$.
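
The difference between Bragg peaks and the diffuse background shows up in how the intensity scales with $n$. The following sketch (illustrative only, with hypothetical distances $d_0 = 1$, $d_1 = 1.5$, so that $d_1/d_0 = 3/2$ and $m_0 = 2$) evaluates the finite-$n$ intensity $n + 2\,\mathrm{Re}\sum_{r=1}^{n-1}(n-r)[C(q)]^r$ at the wave numbers $q_m = 2\pi m/d_0$ and reports the ratio $I(q_m)$ at $2n$ points to $I(q_m)$ at $n$ points. One expects a ratio close to 4 at multiples of $m_0$ (quadratic scaling) and close to 2 elsewhere (linear scaling).

```python
import cmath
import math

def c_of_q(q, dists, probs):
    """C(q) = sum_j p_j * exp(i*q*d_j)."""
    return sum(p * cmath.exp(1j * q * d) for p, d in zip(probs, dists))

def intensity(q, n, dists, probs):
    """Finite-n intensity n + 2*Re( sum_{r=1}^{n-1} (n - r) * C(q)^r )."""
    c = c_of_q(q, dists, probs)
    return n + 2 * sum((n - r) * c ** r for r in range(1, n)).real

# Hypothetical HT parameters: d1/d0 = 3/2 is rational with m0 = 2.
dists, probs = [1.0, 1.5], [0.4, 0.6]
m0, n = 2, 200

for m in range(1, 7):
    q_m = 2 * math.pi * m / dists[0]
    ratio = intensity(q_m, 2 * n, dists, probs) / intensity(q_m, n, dists, probs)
    kind = "Bragg peak" if m % m0 == 0 else "diffuse"
    print(m, kind, round(intensity(q_m, n, dists, probs), 3), round(ratio, 3))
```

Replacing `dists` by an incommensurable pair, for example `[1.0, math.sqrt(2)]`, should remove the quadratic scaling at every $q_m$, in line with the disordered mode described above.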

The analogy between the two settings is now clear: The memoryless source of Section 2 is
parallel to the random selection process in the HT model. The parameters
$\alpha_j = \log(p_0/p_j)$ of the source are analogous to the distance ratios $d_j/d_0$,
$j = 1, 2, \ldots, M-1$. Their rationality/irrationality
dictates the mode of behavior in both problems. The integer parameter $m_0$ is then defined in both
settings in the very same way. The partially ordered mode in the diffraction model is parallel to
the oscillatory mode of $R_n$ in the data compression problem, and the Bragg peaks at all harmonics
of the fundamental wave-number $q_{m_0} = 2\pi m_0/d_0$ correspond to all harmonics of the
fundamental frequency $\omega_0 = 2\pi\beta m_0$ in the oscillatory component of $R_n$. In other
words, the parameter $\beta$ is conjugate, in this sense, to $1/d_0$.

4 Conclusion 

In this short paper, we have made an attempt to provide some insight into the erratic behavior of the 
redundancy pattern of the Shannon code for lossless data compression. The insight we propose is 
rooted in the physical point of view, where the two modes of the behavior of the redundancy patterns 
are respectively analogous to partial order and complete disorder of a wave diffraction medium, 
which dictates the existence or non-existence of Bragg peaks pertaining to perfectly constructive 
interference. It is hoped that this physical insight contributes to the intuitive understanding of the 
redundancy of the Shannon code and perhaps other codes as well. 

Finally, we comment that the above analyses are, in principle, generalizable to the finite-state
Markov case (and indeed, Markov models have been proposed in the diffraction setting too
[5], [14]). When it comes to the Markov case, then both in the data compression problem and in
the HT model, the role played by high powers of $C_m$ is essentially replaced by high powers of a
state transition probability matrix whose entries are weighted by the appropriate complex
exponentials (which depend on $m$). What matters then are the eigenvalues of this matrix. More
concretely, it is not difficult to see that the spectral radius, in both settings, never exceeds
unity. In the data compression problem, the critical behavior is dictated by the existence or
non-existence of integer values $\{m\}$ for which the spectral radius is exactly 1. When such
values of $m$ exist, then $R_n$ has an oscillatory behavior. In the diffraction problem, the
distinction between the two types of behavior is dictated by the existence of values of $m$ for
which one of the eigenvalues is exactly equal to one.